尽管在机器学习安全方面进行了大量的学术工作,但对野外机器学习系统的攻击的发生知之甚少。在本文中,我们报告了139名工业从业人员的定量研究。我们分析攻击发生和关注,并评估影响影响威胁感知和暴露的因素的统计假设。我们的结果阐明了对部署的机器学习的现实攻击。在组织层面上,尽管我们没有发现样本中威胁暴露的预测因素,但实施防御量取决于暴露于威胁或预期的可能性成为目标的可能性。我们还提供了从业人员对单个机器学习攻击的相关性的答复,揭示了不可靠的决策,业务信息泄漏和偏见引入模型等复杂问题。最后,我们发现,在个人层面上,有关机器学习安全性的先验知识会影响威胁感知。我们的工作为在实践中的对抗机器学习方面进行更多研究铺平了道路,但收益率也可以洞悉监管和审计。
translated by 谷歌翻译
尽管机器学习在实践中被广泛使用,但对从业者对潜在安全挑战的理解知之甚少。在这项工作中,我们缩小了这一巨大的差距,并贡献了一项定性研究,重点是开发人员的机器学习管道和潜在脆弱组件的心理模型。类似的研究在其他安全领域有助于发现根本原因或改善风险交流。我们的研究揭示了从业人员的机器学习安全性心理模型的两个方面。首先,从业人员通常将机器学习安全与与机器学习无直接相关的威胁和防御措施混淆。其次,与大多数学术研究相反,我们的参与者认为机器学习的安全性与单个模型不仅相关,而在整个工作流程中,由多个组件组成。与我们的其他发现共同,这两个方面为确定机器学习安全性的心理模型提供了基础学习安全。
translated by 谷歌翻译
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
translated by 谷歌翻译
Local patterns play an important role in statistical physics as well as in image processing. Two-dimensional ordinal patterns were studied by Ribeiro et al. who determined permutation entropy and complexity in order to classify paintings and images of liquid crystals. Here we find that the 2 by 2 patterns of neighboring pixels come in three types. The statistics of these types, expressed by two parameters, contains the relevant information to describe and distinguish textures. The parameters are most stable and informative for isotropic structures.
translated by 谷歌翻译
Many dialogue systems (DSs) lack characteristics humans have, such as emotion perception, factuality, and informativeness. Enhancing DSs with knowledge alleviates this problem, but, as many ways of doing so exist, keeping track of all proposed methods is difficult. Here, we present the first survey of knowledge-enhanced DSs. We define three categories of systems - internal, external, and hybrid - based on the knowledge they use. We survey the motivation for enhancing DSs with knowledge, used datasets, and methods for knowledge search, knowledge encoding, and knowledge incorporation. Finally, we propose how to improve existing systems based on theories from linguistics and cognitive science.
translated by 谷歌翻译
Attention-based multiple instance learning (AMIL) algorithms have proven to be successful in utilizing gigapixel whole-slide images (WSIs) for a variety of different computational pathology tasks such as outcome prediction and cancer subtyping problems. We extended an AMIL approach to the task of survival prediction by utilizing the classical Cox partial likelihood as a loss function, converting the AMIL model into a nonlinear proportional hazards model. We applied the model to tissue microarray (TMA) slides of 330 lung cancer patients. The results show that AMIL approaches can handle very small amounts of tissue from a TMA and reach similar C-index performance compared to established survival prediction methods trained with highly discriminative clinical factors such as age, cancer grade, and cancer stage
translated by 谷歌翻译
Nucleolar organizer regions (NORs) are parts of the DNA that are involved in RNA transcription. Due to the silver affinity of associated proteins, argyrophilic NORs (AgNORs) can be visualized using silver-based staining. The average number of AgNORs per nucleus has been shown to be a prognostic factor for predicting the outcome of many tumors. Since manual detection of AgNORs is laborious, automation is of high interest. We present a deep learning-based pipeline for automatically determining the AgNOR-score from histopathological sections. An additional annotation experiment was conducted with six pathologists to provide an independent performance evaluation of our approach. Across all raters and images, we found a mean squared error of 0.054 between the AgNOR- scores of the experts and those of the model, indicating that our approach offers performance comparable to humans.
translated by 谷歌翻译
Mitotic activity is key for the assessment of malignancy in many tumors. Moreover, it has been demonstrated that the proportion of abnormal mitosis to normal mitosis is of prognostic significance. Atypical mitotic figures (MF) can be identified morphologically as having segregation abnormalities of the chromatids. In this work, we perform, for the first time, automatic subtyping of mitotic figures into normal and atypical categories according to characteristic morphological appearances of the different phases of mitosis. Using the publicly available MIDOG21 and TUPAC16 breast cancer mitosis datasets, two experts blindly subtyped mitotic figures into five morphological categories. Further, we set up a state-of-the-art object detection pipeline extending the anchor-free FCOS approach with a gated hierarchical subclassification branch. Our labeling experiment indicated that subtyping of mitotic figures is a challenging task and prone to inter-rater disagreement, which we found in 24.89% of MF. Using the more diverse MIDOG21 dataset for training and TUPAC16 for testing, we reached a mean overall average precision score of 0.552, a ROC AUC score of 0.833 for atypical/normal MF and a mean class-averaged ROC-AUC score of 0.977 for discriminating the different phases of cells undergoing mitosis.
translated by 谷歌翻译
Modern machine learning models are often constructed taking into account multiple objectives, e.g., to minimize inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models and the approximation of the Pareto front is used to assess their performance. However, when estimating generalization performance of an approximation of a Pareto front found on a validation set by computing the performance of the individual models on the test set, models might no longer be Pareto-optimal. This makes it unclear how to measure performance. To resolve this, we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods and to study its capabilities for comparing two optimization experiments.
translated by 谷歌翻译
The field of natural language processing (NLP) has grown over the last few years: conferences have become larger, we have published an incredible amount of papers, and state-of-the-art research has been implemented in a large variety of customer-facing products. However, this paper argues that we have been less successful than we should have been and reflects on where and how the field fails to tap its full potential. Specifically, we demonstrate that, in recent years, subpar time allocation has been a major obstacle for NLP research. We outline multiple concrete problems together with their negative consequences and, importantly, suggest remedies to improve the status quo. We hope that this paper will be a starting point for discussions around which common practices are -- or are not -- beneficial for NLP research.
translated by 谷歌翻译